May 5, 2017
Written by Katharine Pittman
Contact: klp398@nyu.edu
Since 2011, the advocation and utilization of renewable energy has gained exponential momentum as the effect of climate change and global warming has become increasingly acknowledged. However, state legislation is the primary factor as to where renewable energy is implemented. Large energy companies that provide renewable energy options, such as SunPower and NRG, only provide these services in certain states. At the moment these states include: CA, CO, GA, MA, MD, MN, NJ, NM, PA, RI, and VT. These states will be referred to as the "Key 11 States" throughout this project.
The object of this project is to investigate which states should be targeted next based on various avenues.
In [543]:
import sys # system module
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics module
import datetime as dt # date and time module
import numpy as np # foundation for Pandas
%matplotlib inline
# check versions
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Today: ', dt.date.today())
One of my main priorities for selecting a data set was that it contained all of the information I needed for that particular part of my project, and I wouldn't have to "piece together" different files to create a graph.
The U.S. Energy Information Administration (https://www.eia.gov/) has an innumerable amount of data, some files more detailed, while others are more holistic in nature.
For the first part of my project, I am looking at the electricity prices for each state in 2015 for the Residential sector. From this I will identify the Top 5 Most Expensive States in terms of electricity prices that are NOT part of the Key 11 States, and then look at the average price of electricity per state from 2000 to 2015 for these Top 5 States.
In [544]:
url = 'https://www.eia.gov/electricity/data/state/avgprice_annual.xlsx' # importing the data
price = pd.read_excel(url)
price.head() # looking at the beginning of the data
Out[544]:
In [545]:
price.tail() # looking at the end of the data
Out[545]:
After looking at the head of the data, I don't need the first row and I only need the first four columns, since I am only interested in prices for Residential. After looking at the tail of the data, I don't need any of the data prior to 2000, since I am focusing on 2000-2015.
After looking at the data from the url in full, I also am only interested in the Full-Service Provider prices for each state, so I will pull only this data from the dataset.
In [546]:
url = 'https://www.eia.gov/electricity/data/state/avgprice_annual.xlsx' # importing our data
price = pd.read_excel(url,
skiprows=1, # skipping the first row
usecols=range(4), # only interested in the first four columns
skipfooter=1126) # skipping data for years 1990-1999
sector = ['Full-Service Providers'] # specifying what sector I want
price.isin(sector)
cleaned_price = price[price['Industry Sector Category'].isin(sector)]
print(cleaned_price)
In [547]:
# Inspecting the cleaned up data
print('Dimensions: ', cleaned_price.shape, '\n')
print('Column labels: ', cleaned_price.columns, '\n')
print('Row labels: ', cleaned_price.index, '\n')
print('Type: ', cleaned_price.dtypes, '\n') # '\n' adds a blank line and makes the output easier to read
In [548]:
year = [2015] # specifying the year I want
cleaned_price.isin(year)
price2015 = cleaned_price[cleaned_price['Year'].isin(year)]
price2015_sorted = price2015.sort_values('Residential') # sorts the prices in ascending order
print(price2015_sorted)
In [549]:
# Showing the prices of electricity from 2000-2015 for just one state as a reference
state = ['HI'] # specifying what state I want
cleaned_price.isin(state)
HI_price = cleaned_price[cleaned_price['State'].isin(state)].iloc[::-1] # reverses the order
# of the rows
print(HI_price) # showing how the data looks
In [550]:
# Showing a graph for just one state as a reference
plt.figure(figsize = (16,7))
plt.plot(HI_price['Year'], HI_price['Residential'], color = 'red')
plt.suptitle('Price of Electricity in HI by Year', fontsize=18)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Residential', fontsize=12)
Out[550]:
In [551]:
# Pulling the price by year data for the additional four states I'm interested in
state = ['CT'] # specifying what state I want
cleaned_price.isin(state)
CT_price = cleaned_price[cleaned_price['State'].isin(state)].iloc[::-1] # reverses the order
state = ['AK'] # specifying what state I want
cleaned_price.isin(state)
AK_price = cleaned_price[cleaned_price['State'].isin(state)].iloc[::-1] # reverses the order
state = ['NH'] # specifying what state I want
cleaned_price.isin(state)
NH_price = cleaned_price[cleaned_price['State'].isin(state)].iloc[::-1] # reverses the order
state = ['NY'] # specifying what state I want
cleaned_price.isin(state)
NY_price = cleaned_price[cleaned_price['State'].isin(state)].iloc[::-1] # reverses the order
In [552]:
# Plotting a line graph with data from all five states I'm interested in
plt.figure(figsize = (17,10))
plt.plot(HI_price['Year'], HI_price['Residential'], color = 'red', label = "Hawaii")
plt.plot(CT_price['Year'], CT_price['Residential'], color = 'blue', label = "Connecticut")
plt.plot(AK_price['Year'], AK_price['Residential'], color = 'green', label = "Arkansas")
plt.plot(NH_price['Year'], NH_price['Residential'], color = 'purple', label = "New Hampshire")
plt.plot(NY_price['Year'], NY_price['Residential'], color = 'cyan', label = "New York")
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.suptitle('Price of Electricity in HI, CT, AK, NH, & NY by Year', fontsize=18)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Residential', fontsize=12)
plt.show()
Besides cost, another avenue to investigate which states should consider adopting renewable energy is coal consumption. Data Set #2 will be looking at the Consumption of Coal for Electricity Generation by State in 2014 and 2015 for All Sectors. The Top 5 Coal Consuming states will be identified and graphically represented.
In [553]:
url2 = 'https://www.eia.gov/electricity/annual/xls/epa_05_09.xlsx' # importing our data
coal_consumption = pd.read_excel(url2)
coal_consumption.head() # looking at the beginning of the data
Out[553]:
In [554]:
coal_consumption.tail() # looking at the end of the data
Out[554]:
Data Set #2 lists the amount of coal consumed (in thousand tons) for electricity generation for each state, however it also divides the states into regions and provides the region total of coal consumed. As I am only focusing on the indvidiual states (and D.C.), I will have to specify that I just want the data for these rows.
In [555]:
url2 = 'https://www.eia.gov/electricity/annual/xls/epa_05_09.xlsx' # importing our data
coal_consumption = pd.read_excel(url2,
skiprows=4,
usecols=range(3),
skipfooter=2)
coal_consumption.columns = ['State', '2015', '2014']
print(coal_consumption)
In [556]:
# Inspecting the cleaned up data
print('Dimensions: ', coal_consumption.shape, '\n')
print('Column labels: ', coal_consumption.columns, '\n')
print('Row labels: ', coal_consumption.index, '\n')
print('Type: ', coal_consumption.dtypes, '\n') # '\n' adds a blank line and makes the output easier to read
In [557]:
# I don't want data for the different regions, so I'm specifying what rows I do want data for
state_list = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois',
'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland',
'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania',
'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah',
'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming',
'District of Columbia']
coal_consumption.isin(state_list)
state_coal_consumption = coal_consumption[coal_consumption['State'].isin(state_list)].sort_values('2015')
# sorts the values in ascending order, so I can pick out which states have the highest consumption
print(state_coal_consumption)
In [558]:
# Here I can use nlargest to graph the top 5 states, but this is only because one of the Key 11 States
# is not included in the Top 5.
my_plot = state_coal_consumption.nlargest(5, '2015').plot(x = 'State', kind='bar')
The final avenue I will take into addressing which states renewable energy should take over next may actually be the most important - emissions. This data set specifies the Carbon Dioxide, Sulfur Dioxide, and Nitrogen Oxides emitted from energy consumption at conventional power plants and combined-heat-and-power plants for 2014 and 2015. One must keep in mind that identified states from this section are likely similiar to states that consumed a vast amount of coal.
In [559]:
url3 = 'https://www.eia.gov/electricity/annual/xls/epa_09_05.xlsx' # importing our data
emissions = pd.read_excel(url3)
emissions.head() # looking at the beginning of the data
Out[559]:
In [560]:
emissions.tail() # looking at the end of the data
Out[560]:
In [561]:
url3 = 'https://www.eia.gov/electricity/annual/xls/epa_09_05.xlsx' # importing our data
emissions = pd.read_excel(url3,
skiprows=4,
skipfooter=2)
emissions.columns = ['State', 'CO2 2015', 'CO2 2014', 'SO2 2015', 'SO2 2014', 'NOx 2015', 'NOx 2014']
# renaming the columns
emissions['CO2 Total'] = emissions['CO2 2015'] + emissions['CO2 2014'] # adding CO2 emissions for 2014 + 2015
emissions['SO2 Total'] = emissions['SO2 2015'] + emissions['SO2 2014'] # adding SO2 emissions for 2014 + 2015
emissions['NOx Total'] = emissions['NOx 2015'] + emissions['NOx 2014'] # adding NOx emissions for 2014 + 2015
totals = ['State','CO2 Total', 'SO2 Total', 'NOx Total']
small_emissions = state_emissions[totals] # creating a new dataframe with just the total emissions
small_emissions = small_emissions.set_index(['State']) # setting the index to the State name
print(small_emissions) # checking to make sure it looks good
In [562]:
# Inspecting the cleaned up data
print('Dimensions: ', small_emissions.shape, '\n')
print('Column labels: ', small_emissions.columns, '\n')
print('Row labels: ', small_emissions.index, '\n')
print('Type: ', small_emissions.dtypes, '\n') # '\n' adds a blank line and makes the output easier to read
In [563]:
print(small_emissions.sort_values('CO2 Total'))
In [564]:
print(small_emissions.sort_values('SO2 Total'))
In [565]:
print(small_emissions.sort_values('NOx Total'))
In [566]:
CO2_states = ['Texas', 'Florida','Indiana','Ohio','Illinois']
CO2_small = small_emissions.T[CO2_states] # creating a new dataframe with just the total emissions and
# translating the dataframe so the State names will be in the Index
print(CO2_small.T['CO2 Total']) # only want to show the CO2 Totals because the SO2 and NOx are below
CO2_plot = CO2_small.T['CO2 Total'].plot(kind='bar') # translating it again so I can specify
# that I only want 'CO2 Total' in the plot
In [567]:
SO2_NOx_states = ['Texas','Indiana','Ohio','Florida','Kentucky','Illinois']
intermediate = small_emissions.T[SO2_NOx_states] # creating a new dataframe with just the total emissions and
# translating the dataframe so the State names will be in the Index
SO2_NOx = ['SO2 Total', 'NOx Total'] # specifying that I just want these two columns for my new df, not CO2
final_SO2_NOx = intermediate.T[SO2_NOx] #creating a new dataframe with just the SO2 Total and NOx Total and
# translating again so the SO2 Total and NOx Total will be in the Index
print(final_SO2_NOx)
final_SO2_NOx.plot(kind='bar')
Out[567]:
The three avenues I investigated to see which states renewable energy should expand to next included 1) Electricity Cost, 2) Coal Consumption, and 3) CO2, SO2, and NOx Emissions.
Not including the 11 Key States....
Unfortunately, there was not a single state that appeared in all 5 categories. It seems the states with the highest consumptions and emissions have lower electricity prices.
Based on this data, the next states that I believe would have the most profound effect and should consider implementing renewable energy projects are Texas and Indiana.